
How to leverage private modules in your YAML Pipelines

I’ve made no secret about my love of DevOps, and to be honest, over the past few months it’s become more apparent to me than ever that these practices are what make developers more productive. Taking the time to set up these processes correctly is extremely valuable and will pay significant dividends over the life of the project.

Now that being said, I’ve also been doing a lot of work with Python, and honestly I’m really enjoying it. It’s one of those languages that is fairly easy to pick up, but the options and opportunities its flexibility opens up take longer to master. One of the things I’m thankful we started doing early is leveraging Python modules to empower our code re-use.

The ability to leverage pip to install modules into containers creates this amazing ability to separate the business logic from the compute implementation.

To that end, there’s a pretty common problem that I’m surprised isn’t better documented: if you’ve built Python modules and deployed them to a private artifact feed, how do you pull those same modules into a Docker container?

Step 1 – Create a Personal Access Token

The first part of this is creating a personal access token in Azure DevOps, which you can find instructions for here. The key is that the PAT must have access to the Packaging scope, and I recommend read access.

Step 2 – Update the Dockerfile to accept an argument

Next we need to update our Dockerfile to accept an argument so that we can pass in that URL. You’ll need to build the URL you’re going to use in the following format:

https://{PAT}@pkgs.dev.azure.com/{organization}/{project id}/_packaging/{feed name}/pypi/simple/

This is done by adding the following to your Dockerfile:

# Accept the private feed URL at build time
ARG feed_url=""

# Upgrade pip, then install the requirements from the private feed
RUN pip install --upgrade pip
RUN pip install -r requirements.txt --index-url="${feed_url}"

The above will provide the ability to pass the URL required for accessing the private feed into the process of building a docker image.

Step 3 – Update YAML file to pass argument

This can be done in the pipeline YAML file by using the following:

- task: Bash@3
  inputs:
    targetType: 'inline'
    script: 'docker build -t="$(container_registry_name)/$(image_name):latest" -f="./Dockerfile" . --build-arg feed_url="$(feed_url)"'
    workingDirectory: '$(Agent.BuildDirectory)'
  displayName: "Build Docker Image"

At this point, you can create your requirements file with all the appropriate packages, and everything will build when you run your automated build for your container image.
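
For context, here’s a minimal sketch of what a complete Dockerfile using this pattern might look like; the base image, working directory, and entry point are assumptions for illustration:

FROM python:3.8-slim

WORKDIR /app

# Accept the private feed URL (with embedded PAT) at build time
ARG feed_url=""

COPY requirements.txt .
RUN pip install --upgrade pip && \
    pip install -r requirements.txt --index-url="${feed_url}"

COPY . .

CMD ["python", "main.py"]

One thing to keep in mind: values passed as build arguments can show up in the image history, so treat images built this way as sensitive and keep them in a private registry.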

Passing Values to an Orchestrator Function in Durable Functions

So I wanted to get a short post out here on a handy little trick that I actually had a pretty hard time finding the answer to. I’ve been doing work with Durable Functions in Python lately, and really enjoy the options this opens up for longer-running or complex serverless operations.

That being said, I did run into a headache early on: how do I pass values from the initial request to the other parts of the function? So here’s a short post on how to do that, so that hopefully the next person doesn’t have as hard a time finding the answer.

So when you have a durable function, you essentially get 3+ pieces, but at a minimum it’s these 3:

  • HttpStart: This is the code that receives all Http requests for durable functions and then acts as a router, to push them to the correct Orchestrator.
  • Orchestrator: This is the longer-running function that is used to manage state on an ongoing basis.
  • Activity: These are the smaller functions that are called upon by the orchestrator to do the work being asked for.

Now the problem is that normally I would pull from an http request in a function in the following way:

import json, logging
import azure.functions as func

async def main(req: func.HttpRequest, starter: str) -> func.HttpResponse:
    payload = json.loads(req.get_body().decode())
    logging.info(f"Trigger payload = {payload}")

The problem with this, though, is that the actual request that kicked off the function is only accessible to the HttpStart. So now you need to figure out a way to pass this to the orchestrator, and if I’m being honest, the best way to do that isn’t clear.

Part of the reason it isn’t clear is that Durable Functions do a lot of work to encapsulate the inner workings of how they manage state, which I’m thankful for, but it made this harder to figure out.

But after much head-scratching, I figured it out, and here it is:

import json
import azure.functions as func
import azure.durable_functions as df

async def main(req: func.HttpRequest, starter: str) -> func.HttpResponse:
    client = df.DurableOrchestrationClient(starter)

    function_name = req.route_params["functionName"]
    requestBody = json.loads(req.get_body().decode())

    instance_id = await client.start_new(function_name, client_input=requestBody)
    return client.create_check_status_response(req, instance_id)

Now, when you use Core Tools, all the wiring is pre-built for you (strongly recommend that). But if you leverage the “client_input” parameter on the DurableOrchestrationClient, you can pass a JSON-serializable payload over to the orchestrator.

From the orchestrator’s side, the code looks like the following:

requestBody = context.get_input()  # returns the deserialized client_input payload

Now, one of the limitations I found was that you can only pass one parameter this way, so the easy workaround was to implement a complex object that I would then serialize, containing all relevant data.
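
To illustrate that workaround, below is a minimal sketch of the orchestrator side; the payload keys and the activity name ("MyActivity") are hypothetical:

import azure.durable_functions as df

def orchestrator_function(context: df.DurableOrchestrationContext):
    # The single client_input object carries every value the orchestrator needs
    payload = context.get_input()  # e.g. {"name": "...", "count": 3}
    result = yield context.call_activity("MyActivity", payload["name"])
    return result

main = df.Orchestrator.create(orchestrator_function)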

Building modules in Python

So recently I’ve been involved in a lot more development work as a result of changing direction in my career. And honestly it’s been really interesting. Most recently I’ve been doing a lot of work with Python, which up until now was a language that I was familiar with but hadn’t done much with.

And I have to tell you, I’ve really been enjoying it. Python really is a powerful language in its flexibility, and I’ve been doing a lot of work with building out scripts to do a variety of tasks.

As mentioned in previous blog posts, I’m a big fan of DevOps, and one of the things I try to embrace quickly is the idea of packaging code to maximize re-use. To that end, I recently took a step back and went through how to build Python modules and how to package them for installation via pip.

How should you structure your Python modules?

The first thing I’ve learned about Python is that it is very much focused on the idea of convention. What I mean by that is that Python uses convention to define how things are done, rather than a rigid structure that requires configuration. Putting together these kinds of modules is no different.

So when you go about setting up a module, you would use the following structure:

  • {Project Folder}
    • {Module Folder}
      • __init__.py
      • {Module Code}
    • setup.py
    • requirements.txt

From there, the next important part of the setup is to create a setup.py. As you start to flesh this out, you will identify all the details of your package here, along with the dependencies that you want pip to resolve. A great post on project structure is here.

import setuptools 
  
with open("README.md", "r") as fh: 
    long_description = fh.read() 
  
setuptools.setup( 
    name="kevin_module", 
    version="1.0.0", 
    python_requires = '>=3.7',
    packages = setuptools.find_packages(),
    author="...", 
    author_email="...", 
    description="...", 
    long_description=long_description, 
    long_description_content_type="text/markdown", 
    license="MIT", 
) 

Next is the requirement for a README.md, which will outline the specifics of your package. This is where you are going to put the documentation.

Next up is how to implement the __init__.py, which is important because it is basically a manifest for the namespace of the library. Below is the pattern:

from .{python code file} import {class}
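
For example, with a hypothetical module file named calculator.py containing a Calculator class, the __init__.py would read:

from .calculator import Calculator

This makes the class importable directly from the package, so callers can write from {Module Folder} import Calculator.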

From there you can use a utility called twine to bundle and upload the package, and information on twine can be found here. Below are the commands to create and publish the bundle. There is a great post on that found here.
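
A minimal sketch of those commands, assuming setuptools and wheel are installed locally:

python setup.py sdist bdist_wheel
twine upload dist/*

The first command builds the source and wheel distributions into the dist/ folder, and twine then uploads everything in that folder to whatever package repository you have configured.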

Threading in Python

Just another quick tip and trick for Python, and that is how you implement threading. This is surprisingly simple in Python; the threading module ships with the standard library, so there is nothing to install via pip, you just import it:

import threading

From there the process is the following:

numThreads = 10
threadList = []

def RunTest(name):
    print(f"Hello {name}")

for x in range(numThreads):
    # args must be a tuple, hence the trailing comma
    t = threading.Thread(target=RunTest, args=(x,))
    t.start()
    threadList.append(t)

This will then kick off each thread and let them run in the background. The next logical question is: how do I wait for a thread to complete? That would be the following:

for t in threadList:
    t.join()

That’s really it, overall pretty easy.

Quickly Deserializing in Python

Just a quick “Tip and Trick” post right now, but I’ve been doing a lot of work with Python lately. And I have to tell you, I’m really enjoying it. Python provides a lot of easy ways to do the kind of work we have to do so often. The free-form nature of Python gives you the ability to build a lot of things very quickly, and one of the common problems I wanted to solve is pulling configuration information into my Python scripts.

One of the habits I’ve built over my time as a developer is that I HATE magic strings, and prefer to build configuration settings as I build my code, and rely upon that configuration almost immediately. As I’ve been working with Python, I found an easy way to handle this type of pattern.

The first benefit is how Python handles JSON: when Python reads in a JSON file, it will treat it as a “dict”, which is great because it makes it really easy to read key / value pairs. So, for example, the following code works.

Below is the test.json:

{
	"value1":"value 2",
	"object1": {
		"objectvalue1":"test"
	}
}

The following python script can read that:

import json

filePath = "./test.json"
f = open(filePath)
config = json.load(f)
f.close()

value1 = config["value1"]

Honestly, pretty easy. But one of the fun tricks of Python I recently discovered is how to map those values to a class. Take the following for “test.json”:

{
	"value1":"value 2",
	"object1": {
		"objectvalue1":"test1",
		"objectvalue2":"test2"
	}
}

And if I create the following class:

class Object1(object):
	def __init__(self, objectvalue1, objectvalue2):
		self.objectvalue1 = objectvalue1
		self.objectvalue2 = objectvalue2

I can load that class with the following:

import json

filePath = "./test.json"
f = open(filePath)
config = json.load(f)
f.close()

object1 = Object1(**config["object1"])

That one line at the end will take the JSON object from the “object1” property and match it to the constructor, mapping by name. As long as the constructor parameter names match the keys in the JSON file, it will take care of the mapping.
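
One caveat worth knowing: the mapping is strict, so a key in the JSON that the constructor doesn’t accept will raise a TypeError. A quick way to see that behavior:

try:
	object1 = Object1(**{"objectvalue1": "a", "unexpectedkey": "b"})
except TypeError as e:
	print(f"Mapping failed: {e}")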

So just a fun little trick for anyone who’s new to Python.

Updating version numbers for Python Packages in Azure DevOps

So I did a previous post on how to create package libraries in Python, and I wanted to put in a post here on how to solve a problem I immediately identified.

If you look at the setup.py, you will see that the version number and other details are very much hard-coded into the file. This is concerning, as it requires a manual step to update them before you can do a build / publish activity. And honestly, nowadays CI/CD is the way of the world.

So to resolve this, I built a script to have the automated build agent inject the version number created by the CI/CD tool. And that code is the following:

import fileinput
import sys

# Usage: <script> <filename> <text to search> <replacement text>
filename = sys.argv[1]
text_to_search = sys.argv[2]
replacement_text = sys.argv[3]

# Rewrite the file in place, keeping a .bak copy of the original
with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
    for line in file:
        print(line.replace(text_to_search, replacement_text), end='')

I then updated my setup.py with the following:

name="packageName", 
    version="{{__BuildNumber__}}", 
    python_requires = '>=3.7',
    description="{{__BuildReason__}}", 

And that’s it. From there, you just run this script during the build and inject the new build number into the file.

Weekly Links – 3/23

So we’ve all been home for a week now, and honestly most of the world is still grappling with working from home. Having made that transition over the past 2 years, I know how big a transition it is. And I honestly hope everyone is taking time to step back and be healthy, both physically and mentally.

Down to business…

Fun Stuff:

So before the craziness of COVID-19, we got the opportunity to take the family to see Onward, the new Pixar movie. And honestly it was amazing: a great story with great performances. The theme is timeless, figuring out who you are by figuring out where you came from, and both the relationship between the brothers and the story of a son and his father hit me right in the feels. A truly amazing movie, and honestly, if you have dry eyes at the end, I question whether you are a human or a Cylon.

Leveraging Azure Search with Python

So lately I’ve been working on a side project, to showcase some of the capabilities in Azure with regard to PaaS services, and the one I’ve become the most engaged with is Azure Search.

So let’s start with the obvious question, what is Azure Search? Azure Search is a Platform-as-a-Service offering that allows for implementing search as part of your cloud solution in a scalable manner.

The first part is how to create a search service, and really I find the easiest way is to create it via the CLI (the sku argument selects the pricing tier):

az search service create --name {name} --resource-group {group} --location {location} --sku Standard

So after you create an Azure Search service, the next part is to create all the pieces needed. For this, I’ve been doing work with the REST API via Python to manage these elements, so you will see that code here. The steps are:

  • Create the data source
  • Create an index
  • Create an indexer
  • Run the Indexer
  • Get the indexer status
  • Run the Search

Project Description:

For this post, I’m building a search index that crawls through the data compiled from the Chicago Data Portal, which makes statistics and public information available via their API. This solution is pulling in data from that API into cosmos db to make that information searchable. I am using only publicly consumable information as part of this. The information on the portal can be found here.

Create the Data Source

So, the first part of any search discussion is that you need to have a data source that you can search. Can’t get far without that. So the question becomes: what do you want to search? Azure Search supports a wide variety of data sources, and for the purposes of this discussion, I am pointing it at Cosmos DB. The intention is to search the contents of a Cosmos DB collection to ensure that I can pull back relevant entries.

Below is the code that I used to create the data source for the search:

import json
import requests
from pprint import pprint

#The url of your search service
url = 'https://[Url of the search service]/datasources?api-version=2017-11-11'
print(url)

#The API Key for your search service
api_key = '[api key for the search service]'


headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

data = {
    'name': 'cosmos-crime',
    'type': 'documentdb',
    'credentials': {'connectionString': '[connection string for cosmos db]'},
    'container': {'name': '[collection name]'}
}

data = json.dumps(data)
print(type(data))

response = requests.post(url, data=data, headers=headers)
pprint(response.status_code)

To get the API key, you need the admin key, which can be found with the following command:

az search admin-key show --service-name [name of the service] -g [name of the resource group]

After running the above you will have created a data source to connect to for searching.

Create an Index

Once you have the above data source, the next step is to create an index. The index is what Azure Search maps your data into, and it is what searches actually run against, so ultimately think of this as the format your data will be in after indexing completes. To create the index, use the following code:

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexes?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

data = {
     "name": "crimes",  
     "fields": [
       {"name": "id", "type": "Edm.String", "key":"true", "searchable": "false"},
       {"name": "iucr","type": "Edm.String", "searchable":"true", "filterable":"true", "facetable":"true"},
       {"name": "location_description","type":"Edm.String", "searchable":"true", "filterable":"true"},
       {"name": "primary_description","type":"Edm.String","searchable":"true","filterable":"true"},
       {"name": "secondary_description","type":"Edm.String","searchable":"true","filterable":"true"},
       {"name": "arrest","type":"Edm.String","facetable":"true","filterable":"true"},
       {"name": "beat","type":"Edm.Double","filterable":"true","facetable":"true"},
       {"name": "block", "type":"Edm.String","filterable":"true","searchable":"true","facetable":"true"},
       {"name": "case","type":"Edm.String","searchable":"true"},
       {"name": "date_occurrence","type":"Edm.DateTimeOffset","filterable":"true"},
       {"name": "domestic","type":"Edm.String","filterable":"true","facetable":"true"},
       {"name": "fbi_cd", "type":"Edm.String","filterable":"true"},
       {"name": "ward","type":"Edm.Double", "filterable":"true","facetable":"true"},
       {"name": "location","type":"Edm.GeographyPoint"}
      ]
     }

data = json.dumps(data)
print(type(data))

response = requests.post(url, data=data, headers=headers)
pprint(response.status_code)

Using the above code, I’ve identified the data types for each field of the final product, and these all map to the data types supported by Azure Search. The supported data types can be found here.

It’s worth mentioning that there are other key attributes above to consider:

  • facetable: This denotes whether the field can be faceted. For example, in Yelp every restaurant has a “$” to “$$$$$” cost rating, and I want to be able to group results based on this facet.
  • filterable: This denotes if the dataset can be filtered based on those values.
  • searchable: This denotes whether full-text search is performed on the field, which is limited to certain data types.
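
To make those attributes concrete, here is a sketch of a query that filters and facets against the index above; it assumes the arrest field holds 'Y' / 'N' values, and it facets on domestic since that field was marked facetable:

import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexes/crimes/docs'

headers = {
    'Content-Type': 'application/json',
    'api-key': '[api key for the search service]'
}

# Return only arrests, grouped into facet buckets by the domestic flag
params = {
    'api-version': '2017-11-11',
    'search': '*',
    '$filter': "arrest eq 'Y'",
    'facet': 'domestic'
}

response = requests.get(url, headers=headers, params=params)
pprint(response.json())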

Creating an indexer

So the next step is to create the indexer, and this is where the real work happens. The indexer is responsible for performing the following operations:

  • Connect to the data source
  • Pull in the data and put it into the appropriate format for the index
  • Perform any data transformations
  • Manage pulling in new data on an ongoing basis

The code to create the indexer is below:

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexers?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

data = {
    "name": "cosmos-crime-indexer",
    "dataSourceName": "cosmos-crime",
    "targetIndexName": "crimes",
    "schedule": {"interval": "PT2H"},
    "fieldMappings": [
        {"sourceFieldName": "iucr", "targetFieldName": "iucr"},
        {"sourceFieldName": "location_description", "targetFieldName": "location_description"},
        {"sourceFieldName": "primary_decsription", "targetFieldName": "primary_description"},
        {"sourceFieldName": "secondary_description", "targetFieldName": "secondary_description"},
        {"sourceFieldName": "arrest", "targetFieldName": "arrest"},
        {"sourceFieldName": "beat", "targetFieldName": "beat"},
        {"sourceFieldName": "block", "targetFieldName": "block"},
        {"sourceFieldName": "casenumber", "targetFieldName": "case"},
        {"sourceFieldName": "date_of_occurrence", "targetFieldName": "date_occurrence"},
        {"sourceFieldName": "domestic", "targetFieldName": "domestic"},
        {"sourceFieldName": "fbi_cd", "targetFieldName": "fbi_cd"},
        {"sourceFieldName": "ward", "targetFieldName": "ward"},
        {"sourceFieldName": "location", "targetFieldName":"location"}
    ]
}

data = json.dumps(data)
print(type(data))

response = requests.post(url, data=data, headers=headers)
pprint(response.status_code)

What you will notice is that for each field, two attributes are assigned:

  • targetFieldName: This is the field in the index that you are targeting.
  • sourceFieldName: This is the field name according to the data source.

Run the indexer

Once you’ve created the indexer, the next step is to run it. This will cause the indexer to pull data into the index:

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexers/cosmos-crime-indexer/run/?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

reseturl = 'https://[Url of the search service]/indexers/cosmos-crime-indexer/reset/?api-version=2017-11-11'

# Reset clears the indexer's change-tracking state so that all documents are re-indexed
resetResponse = requests.post(reseturl, headers=headers)

# Kick off the indexer run
response = requests.post(url, headers=headers)
pprint(response.status_code)

Triggering the “run” in this way causes the indexer to execute and load the index.

Getting the indexer status

Now, depending on the size of your data source, this indexing process could take time, so I wanted to provide a REST call that will let you get the status of the indexer.

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexers/cosmos-crime-indexer/status/?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

response = requests.get(url, headers=headers)
index_list = response.json()
pprint(index_list)

This will provide you with the status of the indexer, so that you can find out when it completes.
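
If you want a script to block until indexing finishes, a simple polling loop over that same status endpoint works; this is a sketch, and the lastResult / status keys reflect the shape of the status payload in this API version:

import time
import requests

status_url = 'https://[Url of the search service]/indexers/cosmos-crime-indexer/status/?api-version=2017-11-11'
headers = {'api-key': '[api key for the search service]'}

# Poll every 10 seconds until the most recent execution reaches a terminal state
while True:
    result = requests.get(status_url, headers=headers).json()
    last = result.get('lastResult') or {}
    if last.get('status') in ('success', 'persistentFailure'):
        print(f"Indexer finished with status: {last['status']}")
        break
    time.sleep(10)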

Run the search

Finally, if you want to confirm the search is working afterward, you can do the following:

import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexes/crimes/docs?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'

headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

response = requests.get(url, headers=headers)
index_list = response.json()
pprint(index_list)

This will bring back the results of the search. Because no search term was specified, it is treated as a match-everything query and all documents come back.
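
To search for an actual term instead, pass the search parameter in the query string; the term 'theft' here is just an example value:

import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexes/crimes/docs'
headers = {'api-key': '[api key for the search service]'}

# Full-text search across all searchable fields for the word "theft"
params = {'api-version': '2017-11-11', 'search': 'theft'}

response = requests.get(url, headers=headers, params=params)
pprint(response.json())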

I hope this helps with your configuration of Azure Search. Happy searching! :)