
In my last post, I talked about the first two steps of a project I recently undertook to automate a painfully repetitive task. To summarize, I needed to search the NPI Registry for all the practices, physicians, physician assistants, and nurse practitioners in a given zip code and save the results as a CSV. Last time, I covered creating the connection to the NPI Registry’s API and submitting the query to it. Let’s go ahead and get into how to deal with the results the API returns. The source code can be found in my GitHub.

Step 3: Process the API’s Response

This step actually happens in two parts: first, get the results; then, put them into the format I need. But there’s a problem, of course.

The NPI Registry API will only return a maximum of 200 results per call and only 10 results by default. If you want the next 200 results, you have to issue the next query telling it to skip the first 200. If you want the third set of 200, you need to skip the first 400, and so on. To complicate matters further, the response contains the number of results returned but not how many total results match your query in the Registry.
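To picture the paging, the skip value just grows by the page size on each request. Here’s a rough sketch using the registry’s public endpoint and a placeholder zip code (the actual query-building was covered in the last post):

base_url = "https://npiregistry.cms.hhs.gov/api/?version=2.1&postal_code=30301&limit=200"

first_page  = "#{base_url}&skip=0"    # results 1-200
second_page = "#{base_url}&skip=200"  # results 201-400
third_page  = "#{base_url}&skip=400"  # results 401-600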

What’s a code monkey to do? Well, enter the method below. There are a few things happening here, so we’ll take it bit by bit. Also, I’ll be the first to tell you that the following methods have a bit too much responsibility and could do with some good, old-fashioned refactoring, but they’re completely functional for my purpose.


def get_results

  # Running total of results gathered so far and an array to collect them in.
  @result_count = 0
  @results = []

  # Issue the first query (covered in the last post).
  json_response = self.query

  # Intentionally infinite; we break out when a query comes back empty.
  while true

    if json_response['result_count'] == 0
      self.skip = nil
      break
    end

    # Track how many results we've gathered so far.
    @result_count += json_response['result_count']

    # Stash this page of results.
    @results << json_response['results']

    # Tell the API to skip everything we've already collected,
    # then ask for the next page.
    self.skip = @result_count
    json_response = self.query

    # A poor man's progress bar.
    print "#"

  end

  puts "\nFound #{@result_count} results."
  return @results

end

There are a few things the get_results method does right up top. It creates the @result_count and @results instance variables: @result_count is assigned the integer 0, and @results is created as an empty array. We’ll talk about both of these again in a minute. The other thing it does is call self.query and assign the hash it returns to the json_response variable. See the last post for the “dets” on the self.query method call.

You’ll notice that the condition for the while loop is simply true. Because true is always, well… true, I’m intentionally creating an infinite loop. The if condition checks the result_count value of the json_response hash before proceeding. If its value is 0, the method exits the while loop, prints the specified string to the console (via puts), and returns the @results array.

If the value of result_count of the json_response hash is not 0, then it proceeds with the while loop. “What’s the while loop doing,” you innocently wonder. Well, a few things.

First, it adds the value of the result_count key of the json_response hash to the value of @result_count (which is 0 at the start). This is used to track the number of results we’ve gathered at this point in the method’s execution. This will come up again in a minute.

Next, the loop appends the value of the results key of the json_response hash to the end of the @results array that was created when the get_results method was first called.

Then, the value of the skip attr_accessor is set to the value of @result_count. I told you we’d come back to this. The skip attr_accessor is used to tell the API how many results of the submitted query to skip. This is the crux of the solution I devised to the “how many total results” problem.
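For a clearer picture, here’s a minimal sketch of how a skip accessor might feed into the query’s parameters. The class and method names below are assumptions for illustration, not the actual code from the last post:

class NpiSearch

  attr_accessor :skip

  # Hypothetical helper: builds the query string parameters.
  def query_params
    # limit caps each page at the API's 200-result maximum.
    params = { 'version' => '2.1', 'postal_code' => @postal_code, 'limit' => 200 }
    # skip is nil on the first call, then set to the running total.
    params['skip'] = skip unless skip.nil?
    params
  end

end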

Finally, the loop calls self.query again (with the new skip value), prints a “#” (it’s called an “octothorpe”, by the way) to the screen as a sort of progress bar for the user, and the whole Rube Goldberg machine begins again.

As the if condition implies, the while loop will perform this sequence of events until the number of returned results is 0. This effectively (and quickly) gets all the results matching the specified criteria and returns the @results array.
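In use, the whole thing looks something like this (the class name and constructor arguments are assumptions for illustration; the real setup is in the last post):

search = NpiSearch.new(postal_code: '30301')  # hypothetical constructor
results = search.get_results                  # prints one "#" per page, then the total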

So now I had the search results from the API, but I still had to get them into a CSV. There’s a method for that too, buddy. The build_results_hash method below looks a bit complicated but honestly isn’t.

def build_results_hash(results)

  @npi_array = []

  # results is an array of pages, each of which is an array of
  # provider records, hence the nested iteration.
  results.each do |result|

    result.each do |nested_result|

      # Flatten each record into a single-level hash.
      @npi_hash = {}

      @npi_hash['npi'] = nested_result['number']
      @npi_hash['last_updated'] = nested_result['basic']['last_updated']
      @npi_hash['status'] = nested_result['basic']['status']
      @npi_hash['credential'] = nested_result['basic']['credential']
      @npi_hash['first_name'] = nested_result['basic']['first_name']
      @npi_hash['middle_name'] = nested_result['basic']['middle_name']
      @npi_hash['last_name'] = nested_result['basic']['last_name']
      @npi_hash['name'] = nested_result['basic']['name']
      @npi_hash['gender'] = nested_result['basic']['gender']

      # Each record carries a practice location and a mailing address;
      # prefix the keys so both fit in the same flat hash.
      nested_result['addresses'].each do |address|

        key_prefix = address['address_purpose'] == "LOCATION" ? 'location_' : 'mailing_'

        @npi_hash["#{key_prefix}telephone_number"] = address['telephone_number']
        @npi_hash["#{key_prefix}fax_number"] = address['fax_number']
        @npi_hash["#{key_prefix}address_1"] = address['address_1']
        @npi_hash["#{key_prefix}address_2"] = address['address_2']
        @npi_hash["#{key_prefix}city"] = address['city']
        @npi_hash["#{key_prefix}state"] = address['state']
        @npi_hash["#{key_prefix}postal_code"] = address['postal_code']

      end

      # Keep only the primary taxonomy (the provider's main specialty).
      nested_result['taxonomies'].each do |taxonomy|

        if taxonomy['primary'] == true
          @npi_hash['taxonomy_code'] = taxonomy['code']
          @npi_hash['taxonomy_description'] = taxonomy['desc']
        end

      end

      @npi_array.push(@npi_hash)

    end

  end

  # uniq (not uniq!) because uniq! returns nil when there's nothing
  # to remove; this way we always return the array.
  return @npi_array.uniq { |value| value['npi'] }

end

Essentially, it creates an empty array (@npi_array), creates an empty hash for each record (@npi_hash), pulls the desired values out of the results array that’s passed to it, and appends each @npi_hash to the end of the @npi_array. The @npi_array.uniq { |value| value['npi'] } line simply removes any entries with duplicate NPI numbers from the @npi_array.
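For reference, here’s roughly what one flattened entry in the @npi_array ends up looking like. The keys match the ones the method builds (a few are omitted for brevity), but the values are made-up placeholders:

{
  'npi' => '1234567890',
  'last_updated' => '2019-01-15',
  'credential' => 'M.D.',
  'first_name' => 'JANE',
  'last_name' => 'DOE',
  'gender' => 'F',
  'location_telephone_number' => '555-555-5555',
  'location_city' => 'ATLANTA',
  'location_state' => 'GA',
  'taxonomy_code' => '207Q00000X',
  'taxonomy_description' => 'Family Medicine'
}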

This method looks a bit wacky (refactor, anyone?) but it’s honestly not that complicated. Now to get the @npi_array into a CSV.

Step 4: Save that CSV!

We’re in the home stretch, kids! For my purposes, I decided to create a second file to do all the user interaction and actually save the results to the CSV. Again, the source code can be found in my GitHub. I won’t bore you with the user interaction stuff so we’ll get right to the CSV business.

Like Net::HTTP and JSON, there’s a standard library for the CSV format. And like the other libraries I discussed, it can be included with the line below at the top of your code.

require 'csv'

Below is the snippet of the new Ruby file that’s the business end of the CSV action.

print "Enter the full path to the destination CSV file: "

filename = gets.chomp

CSV.open(filename, "w") do |csv_file|

  puts "Saving results to the specified file. Please wait...\n"

  sleep(0.5)

  # The keys of the first results hash become the CSV's header row.
  csv_file << @search_results.first.keys

  # Each hash's values become one row of the file.
  @search_results.each do |hash|

    csv_file << hash.values

  end

  puts "Results successfully saved to '#{filename}'."

end

So what’s going on here? It asks the user to enter a filename to which the results will be saved. It then opens the specified file in “write” mode with the CSV.open(filename, "w") do |csv_file| line. It prints the following line to the console and pauses for 0.5 seconds.

Saving results to the specified file. Please wait...

After 0.5 seconds have passed, it takes the keys of the first hash in the @search_results array (which is holding the contents of the @npi_array) and writes them as the CSV’s header row. It then iterates through the hashes in @search_results and writes each hash’s values as a row in the file. After the last row is written, the CSV.open block ends, which saves and closes the file. It then prints the following line to the console.

"Results successfully saved to '#{filename}'."

From the user’s perspective this program asks for a zip code, searches the API, asks for a filename for their new CSV, and saves it. Easy, cheesy…

The Takeaway (Not the NPR Show)

To underscore the point I made in the first paragraph of the last post, building and testing this program took about four hours or so. Was it more efficient than getting a person to do it? You betcha! In the four zip codes that concerned us, it returned about 1,400 results with all the data we cared about. I don’t have the first idea how long that would take a person to accomplish manually, but I figure far longer than it took to build this program. The program is also reusable: now that it exists, I could search any given zip code (or any other available criteria, with a bit of refactoring) in minutes.

Leave a message below with any comments or questions or, conversely, hit me up on the Tweeters. Also coming soon(-ish): Streaming on Twitch!

Until next time, take care!