Category: Programming

Updating version numbers for Python Packages in Azure DevOps

So I did a previous post on how to create package libraries in Python, and I wanted to follow up here with how to solve a problem I identified almost immediately.

If you look at the setup.py, you will see that the version number and other details are hard coded into the file. This is concerning, as it requires a manual step to update them before every build / publish, and honestly, nowadays CI/CD is the way of the world.

So to resolve this, I built a script to have the automated build agent inject the version number created by the CI/CD tool. And that code is the following:

import fileinput
import sys

# Arguments: <file to update> <text to search for> <replacement text>
filename = sys.argv[1]
text_to_search = sys.argv[2]
replacement_text = sys.argv[3]

# Rewrite the file in place, keeping a .bak copy of the original
with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
    for line in file:
        # Anything printed inside this block is written back to the file
        print(line.replace(text_to_search, replacement_text), end='')

I then updated my setup.py with the following:

name="packageName", 
    version="{{__BuildNumber__}}", 
    python_requires = '>=3.7',
    description="{{__BuildReason__}}", 

And that’s it, from there you just run the script in your pipeline to inject the new build number into the file.
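For example, in an Azure DevOps YAML pipeline, a step along these lines would do the replacement. The script name (replace.py) and its location are placeholders for wherever you saved the snippet above, while $(Build.BuildNumber) and $(Build.Reason) are the pipeline’s predefined variables:

- script: |
    python replace.py setup.py "{{__BuildNumber__}}" "$(Build.BuildNumber)"
    python replace.py setup.py "{{__BuildReason__}}" "$(Build.Reason)"
  displayName: 'Inject build number into setup.py'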

Enabling Remote State with Terraform

So I’ve made no secret of my love of Terraform, and of using it for infrastructure as code. One of the features I really like about Terraform is the ability to execute a plan and see what’s going to change before anything is applied.

What is state in Terraform?

Terraform uses state to power its “plan/apply” workflow, which lets you see the changes before they are applied.
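In practice that workflow is just two commands: generate and review a plan, then apply that exact plan.

terraform plan -out=tfplan
terraform apply tfplan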

How do we enable remote state?

So the process of enabling remote state isn’t hard, and requires only a small piece of code. For my projects, I add a “Terraform.tf” that contains this information. NOTE: I usually add this file to the .gitignore so that I’m not checking in the keys:

terraform {
    backend "azurerm" {
        resource_group_name  = "..."
        storage_account_name = "..."
        container_name = "..."
        key = "..."
    }
}

It really is that simple, and it becomes very important if more than one person is deploying to the same environment. In that scenario, two developers using local state will quickly have their state fall out of sync; remote state is an easy way to manage state in a way that allows collaboration.
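As an aside, if you would rather keep no secrets in the file at all, Terraform also supports partial backend configuration: leave the values out of the backend “azurerm” block and pass them at init time instead. A rough sketch, assuming the storage account access key is held in an environment variable:

terraform init \
    -backend-config="resource_group_name=..." \
    -backend-config="storage_account_name=..." \
    -backend-config="container_name=..." \
    -backend-config="key=terraform.tfstate" \
    -backend-config="access_key=$ARM_ACCESS_KEY"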

A simple trick to handling environments in Terraform

So for a short post, I wanted to share a good habit to get into with Terraform. More specifically, this is an easy way to handle the configuration and deployment of multiple environments while keeping your Terraform scripts easy to manage.

It doesn’t take long working with Terraform to see the immediate value in leveraging it to build out brand new environments, but it never fails to amaze me how many people I talk to who don’t craft their templates to be highly reusable. There are lots of ways to do this, but I wanted to share a practice that I use.

The idea starts by leveraging this pattern. My projects all contain the following key “.tf” files:

  • main.tf: This file contains the provider information, and maps up the service principal (if you are using one) to be used during deployment.
  • variables.tf: This file contains a list of all the variables leveraged in my solution, with a description for their definition.

The “main.tf” file is pretty basic:

provider "azurerm" {
    subscription_id = var.subscription_id
    version = "~> 2.1.0"

    client_id = var.client_id
    client_secret = var.client_secret
    tenant_id = var.tenant_id

    features {}
}

Notice that the above is already wired up for the subscription_id, client_id, client_secret, and tenant_id variables.

Now for my variables file, I have things like the following:

variable "subscription_id" {
    description = "The subscription being deployed."
}

variable "client_id" {
    description = "The client id of the service prinicpal"
}

variable "client_secret" {
    description = "The client secret for the service prinicpal"
}

variable "tenant_id" {
    description = "The client secret for the service prinicpal"
}

Now what this enables is the ability to then have a separate “.tfvars” file for each individual environment:

primarylocation = "..."
secondarylocation = "..."
subscription_id = "..."

client_id = "..."
client_secret = "..."
tenant_id = "..."

From here the process of creating the environment in TerraForm is as simple as:

terraform apply -var-file {EnvironmentName}.tfvars

And then for new environments all I have to do is create a new .tfvars file to contain the configuration for that environment. This enables me to manage the configuration for my environment locally.

NOTE: I usually recommend that you add “*.tfvars” to the .gitignore, so that these files are not checked in. This prevents environment configuration and secrets from ending up in source control.

Another step this then makes relatively easy is the automated deployment, as I can add the following for a YAML task:

- script: |
    touch variables.tfvars
    echo -e "primarylocation = \""$PRIMARYLOCATION"\"" >> variables.tfvars
    echo -e "secondarylocation = \""$SECONDARYLOCATION"\"" >> variables.tfvars
    echo -e "subscription_id = \""$SUBSCRIPTION_ID"\"" >> variables.tfvars
    echo -e "client_id = \""$SP_APPLICATIONID"\"" >> variables.tfvars
    echo -e "tenant_id = \""$SP_TENANTID"\"" >> variables.tfvars
    echo -e "client_secret = \""$SP_CLIENTSECRET"\"" >> variables.tfvars
  displayName: 'Create variables Tfvars'

The above script then takes the build variables for the individual environment, and builds the appropriate “.tfvars” file to run for that environment.

Now this is the more manual approach; ideally you would leverage Azure Key Vault or HashiCorp Vault to access the necessary deployment secrets.
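As a rough sketch of what that could look like in Azure DevOps, the built-in Key Vault task can pull secrets into the pipeline as variables before the step above runs; the service connection, vault, and secret names below are placeholders:

- task: AzureKeyVault@1
  inputs:
    azureSubscription: 'my-service-connection'   # placeholder service connection name
    KeyVaultName: 'my-keyvault'                  # placeholder vault name
    SecretsFilter: 'SP-CLIENTSECRET'             # placeholder secret name(s)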

Azure Search SDK in Government

So I’ve been working on a demo project using Azure Search, and if you’ve followed this blog for a while you know I do a lot of work that requires Azure Government. Recently I needed to implement a search that would be called via an Azure Function, passing latitude and longitude to facilitate searching within a specific distance. So I started to build my Azure Function using the SDK, and what I ended up with looked a lot like this:

Key Data elements:

First, to be able to interact with my search service, I need to install the following NuGet package:

Microsoft.Azure.Search

And upon doing so, I found some pretty good documentation here for building the search client. So I built out a GeoSearchProvider class that looked like the following:

NOTE: I use a custom interface called IConfigurationProvider which encapsulates my configuration store; in most cases it’s Key Vault, but it can be a variety of other options.
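Those two interfaces aren’t shown in this post, but based on how they are used, they would look roughly like the following (the GeoSearchProvider itself is below):

public interface IConfigurationProvider
{
    // Pulls a named setting from the configuration store (Key Vault, etc.)
    Task<string> GetSetting(string name);
}

public interface IGeoSearchProvider
{
    Task<DocumentSearchResult<SearchResultModel>> RunSearch(string text, string latitude, string longitude, string kmdistance, Microsoft.Extensions.Logging.ILogger log);
}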

public class GeoSearchProvider : IGeoSearchProvider
    {
        IConfigurationProvider _configurationProvider;

        public GeoSearchProvider(IConfigurationProvider configurationProvider)
        {
            _configurationProvider = configurationProvider;
        }

        public async Task<DocumentSearchResult<SearchResultModel>> RunSearch(string text, string latitude, string longitude, string kmdistance, Microsoft.Extensions.Logging.ILogger log)
        {
            if (String.IsNullOrEmpty(kmdistance))
            {
                kmdistance = await _configurationProvider.GetSetting("SearchDefaultDistance");
            }

            var serviceName = await _configurationProvider.GetSetting("SearchServiceName");
            var serviceApiKey = await _configurationProvider.GetSetting("SearchServiceApiKey");
            var indexName = await _configurationProvider.GetSetting("SearchServiceIndex");

            SearchIndexClient indexClient = new SearchIndexClient(serviceName, indexName, new SearchCredentials(serviceApiKey));

            var parameters = new SearchParameters()
            {
                Select = new[] { "...{list of fields}..." },
                Filter = string.Format("geo.distance(location, geography'POINT({0} {1})') le {2}", latitude, longitude, kmdistance)
            };

            var logmessage = await _configurationProvider.GetSetting("SearchLogMessage");

            try
            {
                var results = await indexClient.Documents.SearchAsync<SearchResultModel>(text, parameters);

                log.LogInformation(string.Format(logmessage, text, latitude, longitude, kmdistance, results.Count.ToString()));

                return results;
            }
            catch (Exception ex)
            {
                log.LogError(ex.Message);
                log.LogError(ex.StackTrace);
                throw; // rethrow, preserving the original stack trace
            }
        }
    }

The above code seems pretty straightforward and looks like it should return my search results just fine. I even built in logic so that if I don’t give it a distance, it will take a default from the configuration store, which is pretty slick.

But I pretty quickly ran into a problem, and that error was “Host Not found”.

I racked my brain on this for a while before I discovered the cause: by default, the Azure Search SDK talks to Azure Commercial, not Azure Government. After picking through the documentation I found that the SearchIndexClient has a SearchDnsSuffix property, which lets you set the suffix used to find the search service. By default it is “search.windows.net”. I changed my code to the following:

public class GeoSearchProvider : IGeoSearchProvider
    {
        IConfigurationProvider _configurationProvider;

        public GeoSearchProvider(IConfigurationProvider configurationProvider)
        {
            _configurationProvider = configurationProvider;
        }

        public async Task<DocumentSearchResult<SearchResultModel>> RunSearch(string text, string latitude, string longitude, string kmdistance, Microsoft.Extensions.Logging.ILogger log)
        {
            if (String.IsNullOrEmpty(kmdistance))
            {
                kmdistance = await _configurationProvider.GetSetting("SearchDefaultDistance");
            }

            var serviceName = await _configurationProvider.GetSetting("SearchServiceName");
            var serviceApiKey = await _configurationProvider.GetSetting("SearchServiceApiKey");
            var indexName = await _configurationProvider.GetSetting("SearchServiceIndex");
            var dnsSuffix = await _configurationProvider.GetSetting("SearchSearchDnsSuffix");

            SearchIndexClient indexClient = new SearchIndexClient(serviceName, indexName, new SearchCredentials(serviceApiKey));
            indexClient.SearchDnsSuffix = dnsSuffix;

            var parameters = new SearchParameters()
            {
                Select = new[] { "...{list of fields}..." },
                Filter = string.Format("geo.distance(location, geography'POINT({0} {1})') le {2}", latitude, longitude, kmdistance)
            };

            //TODO - Define sorting based on distance

            var logmessage = await _configurationProvider.GetSetting("SearchLogMessage");

            try
            {
                var results = await indexClient.Documents.SearchAsync<SearchResultModel>(text, parameters);

                log.LogInformation(string.Format(logmessage, text, latitude, longitude, kmdistance, results.Count.ToString()));

                return results;
            }
            catch (Exception ex)
            {
                log.LogError(ex.Message);
                log.LogError(ex.StackTrace);
                throw; // rethrow, preserving the original stack trace
            }
        }
    }

And after setting the “SearchSearchDnsSuffix” configuration value to “search.azure.us” for government, it all immediately worked.

How to learn TerraForm

So, as should surprise no one, I’ve been doing a lot of work with Terraform lately, and I’m a huge fan of it in general. I recently did a post talking about the basics of modules (which can be found here).

But one common question I’ve gotten a lot is how to go about learning Terraform: where do I start? So I wanted to do a post gathering some education resources to help.

First, for the “what is Terraform” question: Terraform is an open source product created by HashiCorp that enables infrastructure-as-code and is specifically designed to be cloud vendor agnostic. If you want to learn the basics, I recommend the video I did with Steve Michelotti about Terraform and Azure Government.

But more than that, the question becomes how to go about learning Terraform. The first part is configuring your machine, and for that you can find a blog post I did here. There are some things you need to do to set up your environment for Terraform, and without any guidance it can be confusing.

But once you know what Terraform is, the question becomes: how do I learn how to use it?

Beyond the resources above, what I recommend is using the module registry. One of the biggest strengths of Terraform is a public module registry that lets you see re-usable code written by others, and I highly recommend it as a great way to see working code and play around with it. Here’s the public module registry.
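Pulling one of those registry modules into your own template is just a matter of pointing the source at it. The module name and inputs below are illustrative, so check the module’s page in the registry for its actual inputs:

module "network" {
  source  = "Azure/network/azurerm"   # a module published to the public registry
  version = "~> 3.0"

  resource_group_name = "demo-rg"
  address_space       = "10.0.0.0/16"
}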

So that’s a list of some resources to get you started on learning Terraform. There are also classes on Pluralsight, Udemy, and Lynda; I’ve not leveraged those, but if you are a fan of structured class settings, they would be good places to start.

Working With Modules in Terraform

I’ve done a bunch of posts on Terraform, and there seems to be a bigger and bigger demand for it. If you follow this blog at all, you know that I am a huge supporter of Terraform and the underlying idea of infrastructure-as-code, the value proposition of which I think is essential to any organization that wants to leverage the cloud.

Now that being said, it won’t take long after you start working with TerraForm, before you stumble across the concept of Modules. And it also won’t take long before you see the value of those modules as well.

So the purpose of this post is to walk you through creating your first module, and to give you an idea of how doing this benefits you.

So what is a module? A module in Terraform is a way of creating smaller re-usable components that can make management of your infrastructure significantly easier. So let’s take, for example, a basic Terraform template. The following will generate a single VM in a virtual network.

provider "azurerm" {
  subscription_id = "...."
}

resource "azurerm_resource_group" "rg" {
  name     = "SingleVM"
  location = "eastus"

  tags {
    environment = "Terraform Demo"
  }
}

resource "azurerm_virtual_network" "vnet" {
  name                = "singlevm-vnet"
  address_space       = ["10.0.0.0/16"]
  location            = "eastus"
  resource_group_name = "${azurerm_resource_group.rg.name}"

  tags {
    environment = "Terraform Demo"
  }
}

resource "azurerm_subnet" "vnet-subnet" {
  name                 = "default"
  resource_group_name  = "${azurerm_resource_group.rg.name}"
  virtual_network_name = "${azurerm_virtual_network.vnet.name}"
  address_prefix       = "10.0.2.0/24"
}

resource "azurerm_public_ip" "pip" {
  name                = "vm-pip"
  location            = "eastus"
  resource_group_name = "${azurerm_resource_group.rg.name}"
  allocation_method   = "Dynamic"

  tags {
    environment = "Terraform Demo"
  }
}

resource "azurerm_network_security_group" "nsg" {
  name                = "vm-nsg"
  location            = "eastus"
  resource_group_name = "${azurerm_resource_group.rg.name}"
}

resource "azurerm_network_security_rule" "ssh-access" {
  name                        = "ssh"
  priority                    = 100
  direction                   = "Outbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "*"
  source_address_prefix       = "*"
  destination_address_prefix  = "*"
  destination_port_range      = "22"
  resource_group_name         = "${azurerm_resource_group.rg.name}"
  network_security_group_name = "${azurerm_network_security_group.nsg.name}"
}

resource "azurerm_network_interface" "nic" {
  name                      = "vm-nic"
  location                  = "eastus"
  resource_group_name       = "${azurerm_resource_group.rg.name}"
  network_security_group_id = "${azurerm_network_security_group.nsg.id}"

  ip_configuration {
    name                          = "myNicConfiguration"
    subnet_id                     = "${azurerm_subnet.vnet-subnet.id}"
    private_ip_address_allocation = "dynamic"
    public_ip_address_id          = "${azurerm_public_ip.pip.id}"
  }

  tags {
    environment = "Terraform Demo"
  }
}

resource "random_id" "randomId" {
  keepers = {
    # Generate a new ID only when a new resource group is defined
    resource_group = "${azurerm_resource_group.rg.name}"
  }

  byte_length = 8
}

resource "azurerm_storage_account" "stgacct" {
  name                     = "diag${random_id.randomId.hex}"
  resource_group_name      = "${azurerm_resource_group.rg.name}"
  location                 = "eastus"
  account_replication_type = "LRS"
  account_tier             = "Standard"

  tags {
    environment = "Terraform Demo"
  }
}

resource "azurerm_virtual_machine" "vm" {
  name                  = "singlevm"
  location              = "eastus"
  resource_group_name   = "${azurerm_resource_group.rg.name}"
  network_interface_ids = ["${azurerm_network_interface.nic.id}"]
  vm_size               = "Standard_DS1_v2"

  storage_os_disk {
    name              = "singlevm_os_disk"
    caching           = "ReadWrite"
    create_option     = "FromImage"
    managed_disk_type = "Premium_LRS"
  }

  storage_image_reference {
    publisher = "Canonical"
    offer     = "UbuntuServer"
    sku       = "16.04.0-LTS"
    version   = "latest"
  }

  os_profile {
    computer_name  = "singlevm"
    admin_username = "uadmin"
  }

  os_profile_linux_config {
    disable_password_authentication = true

    ssh_keys {
      path     = "/home/uadmin/.ssh/authorized_keys"
      key_data = "{your ssh key here}"
    }
  }

  boot_diagnostics {
    enabled     = "true"
    storage_uri = "${azurerm_storage_account.stgacct.primary_blob_endpoint}"
  }

  tags {
    environment = "Terraform Demo"
  }
}

Now that Terraform script shouldn’t surprise anyone, but here’s the problem: what if I want to take that template and make it deploy 10 VMs instead of 1 into that virtual network?

Now I could take the public IP, NIC, and virtual machine resource blocks (roughly 70 lines) and do some copying and pasting for the other 9 VMs, which would add somewhere around 630 lines of code to my Terraform template. Then I’d have to manually make sure they are all configured the same, and add the code for a load balancer, which would probably be another 20-30 lines….

If this hasn’t made you cringe, I give up.

The better approach would be to implement a module, so the question is: how do we do that? We start with our folder structure; I would recommend the following:

  • Project Folder
    • Modules
      • Network
      • VirtualMachine
      • LoadBalancer
    • main.tf
    • terraform.tfvars
    • secrets.tfvars

The idea here is that we create a folder to contain all of our modules, and then a separate folder for each one. When I was learning about modules, this tripped me up: you can’t have the “.tf” files for your modules in the same directory, especially if they have any similarly named variables like “region”. If you put them in the same directory you will get errors about duplicate variables.

Now once you have your folders, what do we put in each of them? The answer is a main.tf. I do this because it makes it easy to reference and track the core of each module in my code. Being a developer and DevOps fan, I firmly believe in consistency.

So what does that look like? Below is the file I put in “Network\main.tf”:

variable "address_space" {
    type = string
    default = "10.0.0.0/16"
}

variable "default_subnet_cidr" {
    type = string 
    default = "10.0.2.0/24"
}

variable "location" {
    type = string
}

resource "azurerm_resource_group" "basic_rig_network_rg" {
    name = "vm-Network"
    location = var.location
}

resource "azurerm_virtual_network" "basic_rig_vnet" {
    name                = "basic-vnet"
    address_space       = [var.address_space]
    location            = azurerm_resource_group.basic_rig_network_rg.location
    resource_group_name = azurerm_resource_group.basic_rig_network_rg.name
}

resource "azurerm_subnet" "basic_rig_subnet" {
 name                 = "basic-vnet-subnet"
 resource_group_name  = azurerm_resource_group.basic_rig_network_rg.name
 virtual_network_name = azurerm_virtual_network.basic_rig_vnet.name
 address_prefix       = var.default_subnet_cidr
}

output "name" {
    value = "BackendNetwork"
}

output "subnet_instance_id" {
    value = azurerm_subnet.basic_rig_subnet.id
}

output "networkrg_name" {
    value = azurerm_resource_group.basic_rig_network_rg.name
}

Now there are a couple of key elements that I make use of here: you’ll notice that there is a variables section, the Terraform resources themselves, and an outputs section.

It’s important to remember that every Terraform module is self-contained; similar to how you scope parameters in code, you pass values into the module and then use them accordingly. And by defining “output” values, I can pass things back to the main template.

Now the question becomes: what does it look like to implement it? When I go back to my root level “main.tf”, I can now leverage the following:

module "network" {
  source = "./modules/network"

  address_space = var.address_space
  default_subnet_cidr = var.default_subnet_cidr
  location = var.location
}

A couple of key elements to note here: the “source” property points to the module folder that contains the main.tf, and I am mapping variables at my environment level to the module. This allows me to control what gets passed into each instance of the module, and shows how to get values into the module.

The next question is how you get values out. In my root main.tf file, I would have code like the following:

network_subnet_id = module.network.subnet_instance_id

To interface with the module’s outputs, I just reference module.network.___________ with the appropriate output variable name.
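For example, a VirtualMachine module (like the one in the folder structure above) could consume those outputs directly; its input names here are illustrative:

module "virtualmachine" {
  source = "./modules/virtualmachine"

  # Illustrative inputs, fed from the network module's outputs
  subnet_id           = module.network.subnet_instance_id
  resource_group_name = module.network.networkrg_name
  location            = var.location
}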

Now I want to be clear this is probably the most simplistic module I can think of, but it illustrates how to hit the ground running and create new modules, or even use existing modules in your code.

For more information, here’s a link to the HashiCorp learn site, and here is a link to the TerraForm module registry, which is a collection of prebuilt modules that you can leverage in your code.

Leveraging Azure Search with Python

So lately I’ve been working on a side project, to showcase some of the capabilities in Azure with regard to PaaS services, and the one I’ve become the most engaged with is Azure Search.

So let’s start with the obvious question, what is Azure Search? Azure Search is a Platform-as-a-Service offering that allows for implementing search as part of your cloud solution in a scalable manner.

If you want more background on the basics of “What is Azure Search?”, the official documentation covers it well.

The first part is how to create a search service, and really I find the easiest way is to create it via CLI:

az search service create --name {name} --resource-group {group} --location {location}

So after you create an Azure Search Service, the next part is to create all the pieces needed. For this, I’ve been doing work with the REST API via Python to manage these elements, so you will see that code here.

  • Create the data source
  • Create an index
  • Create an indexer
  • Run the Indexer
  • Get the indexer status
  • Run the Search

Project Description:

For this post, I’m building a search index that crawls through the data compiled from the Chicago Data Portal, which makes statistics and public information available via their API. This solution is pulling in data from that API into cosmos db to make that information searchable. I am using only publicly consumable information as part of this. The information on the portal can be found here.

Create the Data Source

So, the first part of any search discussion is that you need a data source you can search; you can’t get far without that. The question becomes: what do you want to search? Azure Search supports a wide variety of data sources, and for the purposes of this discussion, I am pointing it at Cosmos DB, with the intention of searching the contents of a Cosmos DB collection and pulling back relevant entries.

Below is the code that I used to create the data source for the search:

import json
import requests
from pprint import pprint

#The url of your search service
url = 'https://[Url of the search service]/datasources?api-version=2017-11-11'
print(url)

#The API Key for your search service
api_key = '[api key for the search service]'


headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

data = {
    'name': 'cosmos-crime',
    'type': 'documentdb',
    'credentials': {'connectionString': '[connection string for cosmos db]'},
    'container': {'name': '[collection name]'}
}

data = json.dumps(data)
print(type(data))

response = requests.post(url, data=data, headers=headers)
pprint(response.status_code)

To get the API key, you need the management key which can be found with the following command:

az search admin-key show --service-name [name of the service] -g [name of the resource group]

After running the above you will have created a data source to connect to for searching.

Create an Index

Once you have the above data source, the next step is to create an index. The index is what Azure Search will map your data to, and it is what searches will actually run against. Ultimately, think of it as the shape your data will take after indexing completes. To create the index, use the following code:

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexes?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

data = {
     "name": "crimes",  
     "fields": [
       {"name": "id", "type": "Edm.String", "key":"true", "searchable": "false"},
       {"name": "iucr","type": "Edm.String", "searchable":"true", "filterable":"true", "facetable":"true"},
       {"name": "location_description","type":"Edm.String", "searchable":"true", "filterable":"true"},
       {"name": "primary_description","type":"Edm.String","searchable":"true","filterable":"true"},
       {"name": "secondary_description","type":"Edm.String","searchable":"true","filterable":"true"},
       {"name": "arrest","type":"Edm.String","facetable":"true","filterable":"true"},
       {"name": "beat","type":"Edm.Double","filterable":"true","facetable":"true"},
       {"name": "block", "type":"Edm.String","filterable":"true","searchable":"true","facetable":"true"},
       {"name": "case","type":"Edm.String","searchable":"true"},
       {"name": "date_occurrence","type":"Edm.DateTimeOffset","filterable":"true"},
       {"name": "domestic","type":"Edm.String","filterable":"true","facetable":"true"},
       {"name": "fbi_cd", "type":"Edm.String","filterable":"true"},
       {"name": "ward","type":"Edm.Double", "filterable":"true","facetable":"true"},
       {"name": "location","type":"Edm.GeographyPoint"}
      ]
     }

data = json.dumps(data)
print(type(data))

response = requests.post(url, data=data, headers=headers)
pprint(response.status_code)

In the above code, I’ve identified the data types of the final product, and these all map to the data types supported by Azure Search. The supported data types can be found here.

It’s worth mentioning that there are other key attributes above to consider:

  • facetable: This denotes whether the field can be faceted. For example, in Yelp all restaurants have a “$” to “$$$$$” cost rating, and I want to be able to group results based on that facet.
  • filterable: This denotes whether results can be filtered on the field’s values.
  • searchable: This denotes whether full-text search is performed on the field; only certain data types can be searchable.

Creating an indexer

So the next step is to create the indexer, which is what does the real work. The indexer is responsible for performing the following operations:

  • Connect to the data source
  • Pull in the data and put it into the appropriate format for the index
  • Perform any data transformations
  • Manage pulling in new data on an ongoing basis

The code to create the indexer is below:

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexers?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

data = {
    "name": "cosmos-crime-indexer",
    "dataSourceName": "cosmos-crime",
    "targetIndexName": "crimes",
    "schedule": {"interval": "PT2H"},
    "fieldMappings": [
        {"sourceFieldName": "iucr", "targetFieldName": "iucr"},
        {"sourceFieldName": "location_description", "targetFieldName": "location_description"},
        {"sourceFieldName": "primary_decsription", "targetFieldName": "primary_description"},
        {"sourceFieldName": "secondary_description", "targetFieldName": "secondary_description"},
        {"sourceFieldName": "arrest", "targetFieldName": "arrest"},
        {"sourceFieldName": "beat", "targetFieldName": "beat"},
        {"sourceFieldName": "block", "targetFieldName": "block"},
        {"sourceFieldName": "casenumber", "targetFieldName": "case"},
        {"sourceFieldName": "date_of_occurrence", "targetFieldName": "date_occurrence"},
        {"sourceFieldName": "domestic", "targetFieldName": "domestic"},
        {"sourceFieldName": "fbi_cd", "targetFieldName": "fbi_cd"},
        {"sourceFieldName": "ward", "targetFieldName": "ward"},
        {"sourceFieldName": "location", "targetFieldName":"location"}
    ]
}

data = json.dumps(data)
print(type(data))

response = requests.post(url, data=data, headers=headers)
pprint(response.status_code)

What you will notice is that for each field, two attributes are assigned:

  • targetFieldName: This is the field in the index that you are targeting.
  • sourceFieldName: This is the field name according to the data source.

Run the indexer

Once you’ve created the indexer, the next step is to run it. This will cause the indexer to pull data into the index:

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexers/cosmos-crime-indexer/run/?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

reseturl = 'https://[Url of the search service]/indexers/cosmos-crime-indexer/reset/?api-version=2017-11-11'

resetResponse = requests.post(reseturl, headers=headers)

response = requests.post(url, headers=headers)
pprint(response.status_code)

The above resets the indexer and then triggers a run, which loads the index.

Getting the indexer status

Now, depending on the size of your data source, this indexing process could take some time, so I wanted to provide a REST call that will let you get the status of the indexer.

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexers/cosmos-crime-indexer/status/?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

response = requests.get(url, headers=headers)
index_list = response.json()
pprint(index_list)

This will provide you with the status of the indexer, so that you can find out when it completes.
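If you only care about the most recent run, the status response includes a lastResult element you can inspect; note that the exact shape can vary a bit between API versions:

# Pull out just the most recent run's status from the response above
last_result = index_list.get('lastResult') or {}
pprint(last_result.get('status'))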

Run the search

Finally if you want to confirm the search is working afterward, you can do the following:

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexes/crimes/docs?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

response = requests.get(url, headers=headers)
index_list = response.json()
pprint(index_list)

This will bring back the results of the search; since no search term is specified, it brings back everything.
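To run a more targeted query, you can add search and $filter parameters to the same documents endpoint. A quick sketch, where the search term and coordinates are made up:

params = {
    'api-version': '2017-11-11',
    'search': 'theft',
    # Limit results to within 5 km of a point, given as (longitude latitude)
    '$filter': "geo.distance(location, geography'POINT(-87.62 41.88)') le 5"
}

response = requests.get('https://[Url of the search service]/indexes/crimes/docs', headers=headers, params=params)
pprint(response.json())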

I hope this helps with configuring Azure Search. Happy searching :)!

Getting Started with Azure (developer perspective)

So there’s a common question I’ve been getting a lot lately, and that’s “I want to learn Azure, where do I start?” This is ultimately a very reasonable question, because as much as the cloud has permeated the digital world, there are still some organizations that have only recently started to adopt it.

There are many reasons people choose to adopt the cloud: scalability, cost, flexibility, etc. But for today’s post I’m going to assume that you have already decided to go to the Azure cloud and are looking for resources to ramp up, so I wanted to provide those here:

MS Learn: The site provides videos, reading, and walk-throughs that can assist with learning this type of material.

MS Learn for Specific Services: There are several common services that many people think of when they think of the cloud, and MS Learn also has learning paths focused on those specific services.

EdX Courses: EdX is a great site with a lot of well-made courses, and there is a wealth of options for Azure and the cloud. Here are a few I thought relevant, though it is not an exhaustive list:

  • Architecting Distributed Applications: One common mistake many make with regard to the cloud is to think of it as “just another data center”, and that’s just not true. To build effective and scalable applications, they need to be architected to take advantage of distributed compute. This course does a great job of laying out how to make sure you are architected to work in a distributed fashion.
  • Microsoft Azure Storage: A great course on the basics of using Azure Storage.
  • Microsoft Azure Virtual Machines: The virtual machine is the cornerstone of Azure and provides many options to build and scale out effectively. This is a good introduction to the most basic service in Azure.
  • Microsoft Azure App Service: The most popular service in Azure, App Service enables developers to deploy and configure apps without worrying about the machine running under the covers. A great overview.
  • Microsoft Azure Virtual Networks: As I mentioned above, software-based networking is one of the key pieces required for the cloud, and this gives a good introduction to how to leverage it.
  • Databases in Azure: Another key component of the cloud is the database, and this talks about the options for leveraging platform-as-a-service database offerings to eliminate the overhead of maintaining the VMs yourself.
  • Azure Security and Compliance: Security is another key component; digital threats are constantly evolving, and Azure provides a lot of tools to protect your workload. This is an essential piece of every architecture.
  • Building your Azure skills toolkit: A good beginner course for getting your skills up to speed with Azure.

Beyond the courses above, there are additional tools and resources I would recommend exploring as you go.

Those are just some of the many resources that can be helpful to starting out with Azure and learning to build applications for the cloud. It is not an exhaustive list, so if you have a resource you’ve found helpful, please post it in the comments below.

Building a Solr Cluster with TerraForm – Part 1

So it’s no surprise that I’ve been talking a lot about how amazing Terraform is, and recently I’ve been doing a lot of investigation into Solr and how to build a scalable Solr cluster.

So, given the Kubernetes template, I wanted to try my hand at something similar with Solr. The goals of this project were the following:

  1. Build a generic template for creating a SolrCloud cluster with distributed shards.
  2. Build out the ability to scale the cluster, for now using Terraform to manually trigger increases to the cluster size.
  3. Make the nodes automatically add themselves to the cluster.

I could do this just using bash scripts and Packer, but instead I wanted to try my hand at cloud-init.

But that’s the end goal; in this post I wanted to walk through the steps I go through to get there. The first real step is to get Solr installed on the Linux machines.

So let’s start with “What is Solr?” The answer is that Solr is an open source search platform that provides a means of creating a search engine. It works in the same vein as Elasticsearch and other technologies. Solr has been around for quite a while and is used by some of the largest companies that implement search to handle requests from their customers, Netflix and CareerBuilder among them.

So I’ve decided to try my hand at creating my first Solr cluster, and have reviewed the official getting-started guide.

So I ended up looking into it more, and built out the following script to create a “getting started” solr cluster.

sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
sudo apt-get install -y gnupg-curl
sudo wget -qO- https://www.apache.org/dist/lucene/solr/8.0.0/solr-8.0.0.zip.asc | sudo apt-key add -

sudo apt-get update -y
sudo apt-get install -y unzip
sudo wget http://mirror.cogentco.com/pub/apache/lucene/solr/8.0.0/solr-8.0.0.zip

sudo unzip -q solr-8.0.0.zip
sudo mv solr-8.0.0 /usr/local/bin/solr-8.0.0 -f
sudo rm solr-8.0.0.zip -f

sudo apt-get install -y default-jdk

sudo chmod +x /usr/local/bin/solr-8.0.0/bin/solr
sudo chmod +x /usr/local/bin/solr-8.0.0/example/cloud/node1/solr
sudo chmod +x /usr/local/bin/solr-8.0.0/example/cloud/node2/solr
sudo /usr/local/bin/solr-8.0.0/bin/solr -e cloud -noprompt

The above will configure a “getting started” Solr cluster that leverages all the defaults and is hardly a production implementation, so my next step will be to change that. But for the sake of getting something running, I took the above script and moved it into a Packer template using the following JSON. The script above is the “../scripts/Solr/provision.sh” referenced in the provisioners:

{
  "variables": {
    "deployment_code": "",
    "resource_group": "",
    "subscription_id": "",
    "location": "",
    "cloud_environment_name": "Public"
  },
  "builders": [{   
    "type": "azure-arm",
    "cloud_environment_name": "{{user `cloud_environment_name`}}",
    "subscription_id": "{{user `subscription_id`}}",

    "managed_image_resource_group_name": "{{user `resource_group`}}",
    "managed_image_name": "Ubuntu_16.04_{{isotime \"2006_01_02_15_04\"}}",
    "managed_image_storage_account_type": "Premium_LRS",

    "os_type": "Linux",
    "image_publisher": "Canonical",
    "image_offer": "UbuntuServer",
    "image_sku": "16.04-LTS",

    "location": "{{user `location`}}",
    "vm_size": "Standard_F2s"
  }],
  "provisioners": [
    {
      "type": "shell",
      "script": "../scripts/ubuntu/update.sh"
    },
    {
      "type": "shell",
      "script": "../scripts/Solr/provision.sh"
    },
    {
      "execute_command": "chmod +x {{ .Path }}; {{ .Vars }} sudo -E sh '{{ .Path }}'",
      "inline": [
        "/usr/sbin/waagent -force -deprovision+user &amp;&amp; export HISTSIZE=0 &amp;&amp; sync"
      ],
      "inline_shebang": "/bin/sh -e",
      "type": "shell"
    }]
}

The only other script mentioned is “update.sh”, which has the following logic in it to install the Azure CLI and update the Ubuntu image:

#! /bin/bash

sudo apt-get update -y
sudo apt-get upgrade -y

#Azure-CLI
AZ_REPO=$(sudo lsb_release -cs)
sudo echo "deb [arch=amd64] https://packages.microsoft.com/repos/azure-cli/ $AZ_REPO main" | sudo tee /etc/apt/sources.list.d/azure-cli.list
sudo curl -L https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
sudo apt-get install -y apt-transport-https
sudo apt-get update && sudo apt-get install -y azure-cli

So the above gets me to a good place to create an image with Solr configured.
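To actually build the image from that template, the Packer run looks roughly like this; the template file name (solr.json) is a placeholder for whatever you saved the JSON above as:

packer build \
  -var "subscription_id=<subscription id>" \
  -var "resource_group=<image resource group>" \
  -var "location=<region>" \
  solr.json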

For next steps I will be doing the following:

  • Building a more “production friendly” implementation of Solr into the script.
  • Investigating leveraging cloud init instead of the “golden image” experience with Packer.
  • Building out templates around the use of Zookeeper for managing the nodes.


Configuring Terraform Development Environment

So I’ve been doing a lot of work with a set of open source tools lately, specifically Terraform and Packer. Terraform at its core is a way of implementing true infrastructure-as-code: it provides a simple declarative language where you describe your cloud resources, and then leverages resource providers to deploy them. These providers allow you to deploy to a variety of cloud platforms (the full list can be found here). It also provides robust support for debugging and targeting, and its desired-state approach makes it much easier to maintain your environments in the cloud.

Now that being said, like most open source tools, it can require some configuration of your local development environment, and I wanted to put this post together to describe it. Below are the steps to configure your environment.

Step 1: Install the Windows Subsystem for Linux on your Windows 10 Machine

To start with, you will need to be able to leverage bash via the Windows Subsystem for Linux. You can enable this on a Windows 10 machine by following the steps outlined in this guide:

https://docs.microsoft.com/en-us/windows/wsl/install-win10

Once you’ve completed this step, you will be able to move forward with VS Code and the other components required.

Step 2: Install VS Code and Terraform Plugins

For this guide we recommend VS Code as your editor; it works on a variety of operating systems and is a very lightweight code editor.

You can download VS Code from this link:

https://code.visualstudio.com/download

Once you’ve downloaded and installed VS Code, we need to install the VS Code extension for Terraform. Open the Extensions view and search for “Terraform”.

Then click “Install”, and “Reload” when prompted. This will give you IntelliSense and support for the different Terraform file types.

Step 3: Opening Terminal

You can then perform the remaining steps from the VS Code application. Go to the “View” menu and select “Integrated Terminal”. You will see the terminal appear at the bottom of the window.

By default, the terminal is set to PowerShell; type “bash” to switch to the Bash shell. You can change your default shell by following this guidance – https://code.visualstudio.com/docs/editor/integrated-terminal#_configuration

Step 4: Install Unzip on Subsystem

Run the following command to install “unzip” on your Linux subsystem; this will be required to unzip both Terraform and Packer.

sudo apt-get install unzip

Step 5: Install TerraForm

You will need to execute the following commands to download and install Terraform. We start by getting the latest version of Terraform.

Go to this link:

https://www.terraform.io/downloads.html

And copy the link for the appropriate version of the binaries for TerraForm.

Go back to VS Code, and enter the following commands:

wget {url for terraform}
unzip {terraform.zip file name}
sudo mv terraform /usr/local/bin/terraform
rm {terraform.zip file name}
terraform --version

Step 6: Install Packer

To start with, we need to get the most recent version of Packer. Go to the following URL and copy the link for the appropriate version.

https://www.packer.io/downloads.html

Go back to VS Code and execute the following commands:

wget {packer url} 
unzip {packer.zip file name} 
sudo mv packer /usr/local/bin/packer
rm {packer.zip file name}

Step 7: Install Azure CLI 2.0

Go back to VS Code again and download / install the Azure CLI. To do so, execute the steps and commands found here:

https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-apt?view=azure-cli-latest

Step 8: Authenticating against Azure

Once this is done you are in a place where you can run Terraform projects, but before you do, you need to authenticate against Azure. This can be done by running the commands in the bash terminal described at the link below:

https://docs.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-with-cli
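For Azure Government specifically, the commands in that doc boil down to roughly the following (the subscription id is a placeholder):

az cloud set --name AzureUSGovernment
az login
az account set --subscription "<subscription id>"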

Once that is completed, you will be authenticated against Azure and will be able to run deployments against the various environments.

NOTE: Your authentication token will expire. Should you get a message about an expired token, enter the following command to refresh it:

az account get-access-token 

Token lifetimes are described here – https://docs.microsoft.com/en-us/azure/active-directory/develop/active-directory-token-and-claims#access-tokens

After that you are ready to use Terraform on your local machine.