mike-obrien.net Resume Blog Labs
Tuesday, October 10, 2006

I recently had the opportunity to work with a 2d area scanner (aka Bar Code Reader) in a .NET application. Previously our client used proprietary software that just plain stunk so we decided to write our own software. One very cool thing I discovered while working on this project was that many USB devices (Like bar code scanners, joysticks, game pads, mice, keyboards, etc) sport a Human Interface Device (HID) interface. This interface communicates using the standard HID protocol defined by the USB Implementers Forum. The cool thing about this is that one standard device driver can be written and used for all USB HID devices on a particular platform. Normally you would have to write your own device driver or use an existing one to communicate with a USB device, ouch! USB HID eliminates that requirement. Windows ships with a standard USB HID device driver and automatically makes the HID interface available when a USB HID device is plugged into the system. Nothing to install, no device driver to write, yay! Windows also has a fairly easy to use API (Although a little quirky) for communicating with HID devices. This site, run by Jan Axelson, has a lot of good information on communicating with USB HID devices. His book, USB Complete, is also an invaluable resource, I would definitely recommend it if you will be doing HID development on any platform.

I have an HID .NET library available here if you want to skip most of the gory details and get right down to business. I have used this library in production and have not had any issues, please let me know if you have any feedback. Using this library, while encapsulating most of the underlying details, will require you to understand the format of data contained in the payload of the reports sent to and from the device. HID’s communicate using "reports" which are analogous to packets in network terminology. The reports themselves are a standard USB HID data structure although the payload of these reports will be specific to the device or device type. Usually the payload data format can be found in the device SDK and/or by simply analyzing the payload data the device sends.

The following library, developed for the HHP IT4600/IT4800 area scanner, uses the aforementioned HID library.

IT4XXXScanner.zip (1.98 MB)
Tuesday, October 10, 2006 8:15:40 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [4]  | 

Have you ever wanted to embed an HTTP server into your application? For example let’s say you have a Windows service that you want to administer or monitor (Google desktop is a good example of this). Or let’s say you have an application that requires an alternate interface such a kiosk application. Wouldn’t it be great to have an embedded HTTP server that would allow you to provide a web interface for your application without requiring IIS and without requiring you to write a web server? If so then Lomez is your friend!

Lomez is an in-process HTTP server that allows you to create virtual directories, serve static content from a file or embedded resource and expose any object by implementing the HTTPHandler interface. It loosely resembles ASP.NET to allow you to use it with ease.

Ok, enough of the spiel… Basically I have a number of applications I work with that I would love to have HTTP access to, especially Windows services. I have looked high and low but have not found a library to allow me to do this. So at that point I figured I would have to develop something myself and Lomez was born. I have successfully used this library in a production kiosk framework (At 2 tradeshows so far) and it has worked like a champ. The current version of the library can be found here. I would love to hear any feedback you might have on it.

I consider the current version more of a proof of concept and would really like to rewrite it from the ground up. So hopefully I can begin doing that soon. I would really like to include basic authentication, challenge/response and SSL in the next version. I would also like to optimize the HTTP pipeline so it is lean -n- mean. It’s definitely not right now.

Anyways, if you end up on this page because you are looking for an in-process HTTP server for .NET please let me know your thoughts. And if you decide to use the library, let me know what you think of it.

Thanks!

The Management

Tuesday, October 10, 2006 7:04:45 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [1]  | 

Interesting TechEd presentation on SaaS by Gianpaolo Carraro.

Tuesday, October 10, 2006 4:34:04 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 

So I needed to setup build, dev and staging servers on a limited budget. Getting 3 new servers in a small company is not always feasible as you probably already know. The solution? Microsoft Virtual Server 2005. Although I was familiar with the concept of running virtual servers I had not actually done it before (Shame on me!). So I was very surprised with just how easy it was to setup and configure MSVS! The configuration is so intuitive that I literally didn’t read a word of documentation (Which really appealed to my male, "I don’t need to read the directions" side… ;-)) and had a virtual server up and running in no time. The other thing that really surprised me was performance. I’m running Windows Server 2003 Standard on a 2.0 GHz AMD Athlon XP 2400+ box with 1 GB of memory (Not a very beefy box). The host system is acting as the build server and two virtual servers are running as a development and staging servers (Both running Svr 2003). I’m very surprised how "zippy" the virtual instances run.

Our network admin had a great suggestion; first create a template server which can be copied for new instances. That way you can create new instances with out having to start from scratch. When you create the new instance you need to give Windows a unique SID. Sysinternals has a free SID changer tool called NewSID which will do this. You can then take it a step further and create instance specific copies. For example I can get our dev and staging server’s setup exactly how I want them and make a copy of those instances. Then if either one of those servers gets hosed I can wax them and copy out a fresh version literally within a few minutes.

MSVS installs a net driver that allows all instances to share a single NIC. From what I have read, each instance is assigned a random MAC address. The NIC is put into promiscuous mode and packets are filtered and routed to the virtual servers.

All in all I'm very impressed with MSVS. And I'm pretty hyped about how easy it is to setup complex test and R&D environments with minimal hardware. I would be curious to see how it compares with VMWare. I guess I'll save that for another weekend... ;-)

Microsoft offers Virtual Server 2005 as a free download here.

Tuesday, October 10, 2006 3:31:10 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 
Thursday, October 05, 2006

Some interesting peices on interviewing developers:

http://www.joelonsoftware.com/articles/fog0000000073.html

http://www.hanselman.com/blog/WhatGreatNETDevelopersOughtToKnowMoreNETInterviewQuestions.aspx

Among the many good points Joel makes, one that really struck me was on "illeagal questions" to ask an interviewee. I think some are pretty obvious but I think one really needs to be careful about questions asked in an attempt to take an interest in the person and/or be friendly. For example I love to learn about and understand different cultures, and most people love to talk about their culture and where they are from. So I usually have no qualms about engaging people in this type of conversation. But on an interview this is off limits since it could be taken, not as friendly conversation, but as discriminatory questions regarding race.The following FAQ covers these issues.

http://www.eeoc.gov/facts/qanda.html

Thursday, October 05, 2006 3:14:34 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 

Joel Spolsky (Who runs Fog Creek Software) compiled a very insightful 10 item hit list on how to write better code. A lot of good points. I especially like the points about quiet working conditions and one step builds... Check it out:

http://www.joelonsoftware.com/articles/fog0000000043.html

Wednesday, October 04, 2006 11:30:57 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 
Wednesday, October 04, 2006

When making schema changes to a DB in SQL Server 2005 Management Studio you may receive the following error for long running operations:

"Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding."

Then you’ll probably go check out the execution timeout only to find that it’s set to be unlimited (As I did… ;-)). Then you’ll probably search the web to find why this doesn’t make a difference and find that there is timeout override hiding out under the “Designers” node (The LAST place I would have thought to look!). Now the problem is if you uncheck the timeout override it will not actually remove the override (Bug?). So you have to actually explicitly set the timeout.

Wednesday, October 04, 2006 3:21:57 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [6]  | 
Monday, September 25, 2006

I have been running into this problem lately where I try to create a VSS source control binding on a project that already has a binding (Albeit an incorrect one). When I try to bind the project I get this error:

"One or more of the project's destination paths are invalid. To open this solution, you must specify valid paths for each project. The error was "The Web at 'http://localhost/yada-yada-yada' already contains a project. You need to choose another location for your Web project.".

So I have found a solution in a number of blogs/forums that seems to work. Basically you need to remove the source control information for the project in question in the solution file. The lines will look like this:

SccProjectUniqueName1 = http://localhost/SolutionName/SolutionName.csproj
SccLocalPath1 = .
CanCheckoutShared = true
SccProjectEnlistmentChoice1 = 2

Then decrement the SccNumberOfProjects value by 1. Once this has been done, manually reset the binding and your golden!

Monday, September 25, 2006 4:24:36 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [1]  | 
Saturday, September 23, 2006

Recently I was looking up some information on enhancing query performance and came accross a new term; Sargable. A "contraction of Search ARGument" according the wikipedia. Basically sargability means that the DB server can take advantage of an index to speed up the query. WHERE clauses that arent sargable can cause a table scan because every value, in certain referenced column(s), in the entire table must be evaluated. One example would be MyValue=LTRIM(SomeTable.SomeValue). The result of the LTRIM is not indexed, therefore every value, in the SomeValue column, in the SomeTable table must be evaluated. If possible, you can rework a clause so that DB server can take advantage of an index. For example, instead of saying Year(ThisDate)=Year(GETDATE()) you could say ThisDate BETWEEN YearBegin(GETDATE()) AND YearEnd(GETDATE()). Where YearBegin and YearEnd are deterministic functions that return the first and last day of the year respectively. Or instead of saying WHERE DATEDIFF(day, SomeDateField, GETDATE()) > 7 you could say WHERE SomeDateField < DATEADD(day, -7, GETDATE()). In the first statement, every record must be evaluated in order to see if the condition is true. But in the second statement the current date is processed with the DATEADD function to give is a max date and the evaluation is done with this constant result (Since the DATEADD on GETDATE will only be run once when the query starts) which could then take advantage of an index. If the clause cannot be reworked to be sargable then you could create an indexed computed column.

Saturday, September 23, 2006 4:08:13 AM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 

I have found it very useful to have date tables to perform date range queries. For example if you want to show total sales per month over the last 12 months. You can base your range off of the months over the last 12 months from your month date table. Doing a left join will allow you to return a record for each month even if there are no invoices for that month.

Date tables also give you an easy way of determining related values such as the next month begin date without having to calculate them on the fly. When you are dealing with many records, those dynamic calculations can have a definite performance impact.

The following SQL script creates a date table for the days, months and years between the start date and end date specified at the beginning of the script.

BuildDateTables.zip (.7 KB)

So in the previous example we could compile the sales summary as follows:

SELECT M.[Month], ISNULL(SUM(INV.Amount),0) AS TotalSales
FROM Months AS M LEFT OUTER JOIN Invoice AS INV ON INV.Date BETWEEN M.[Month] AND M.
NextMonthBegin
WHERE M.[Month] BETWEEN DATEADD(month, -12, GETDATE()) AND GETDATE
()
GROUP BY M.
[Month]
ORDER BY M.[Month]
DESC

Saturday, September 23, 2006 3:31:22 AM (GMT Standard Time, UTC+00:00)  #   |  Comments [1]  | 

Sometimes when reading binary files, integers will be stored in the big endian format, especially if the file or the application creating it was initially built on a Sun or Mac machine. Intel systems normally store integers in little endian format. Thus the .NET framework does not have built in functionality to read in big endian integers (As of v1.1). One simple solution is to read the integer into a byte array and use the Array.Reverse function to simply reverse the order of the bytes. Then use the BitConverter class to turn the bytes into an integer. For example:

Dim Data As Byte()
Dim Value As Long

Redim Data(3) '--> 32 bit integer

'--> Big endian value of 360
'--> Equals Hx0168
Data(0) = 0 '--> Hx0 = 0, Most significant byte
Data(1) = 0 '--> Hx0 = 0
Data(2) = 1 '--> Hx01 = 1
Data(3) = 104 '--> Hx68 = 104, Least significant byte

Array.Reverse(Data)

'--> Now it looks like this, little endian style:
'--> Data(0) = 104, Most significant byte
'--> Data(1) = 1
'--> Data(2) = 0
'--> Data(3) = 0, Least significant byte

Value = BitConverter.ToInt32(Data, 0)

'--> Value = 360

You can read more about "endianness" here.

Saturday, September 23, 2006 2:11:05 AM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 

Here is a little trick that will “strip” (In reality zero out) the date or time portion from a datetime value. Basically the SQL datetime is just two 4 byte integers. When you convert a datetime to a float the integer portion represents the date (Which is the number of days since 1/1/1900) and the fractional portion represents the time (The number of 3 millisecond increments that have elapsed since midnight). To zero out one or the other simply requires removing the integer or fractional portion then converting that back into a datetime. The following illustrates this:

DECLARE @DateTime float

SET @DateTime = CAST(GETDATE() AS FLOAT)

/* Date & Time */
SELECT CAST(@DateTime AS datetime) AS DateAndTime

/* Date only */
SELECT CAST(FLOOR(@DateTime) AS datetime) AS DateOnly

/* Time only */
SELECT CAST((@DateTime - FLOOR(@DateTime)) AS datetime) AS TimeOnly


I think this is much cleaner than the common method of converting the datetime to a varchar, truncating the date or time portion, then converting the varchar back to a datetime. You can read more about the datetime data type here in the BOL.

Saturday, September 23, 2006 1:56:45 AM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 

The library found here allows you to obtain the physical coordinates of a zip code. This can be useful when dealing with weather or map related services that require a latitude and longitude as input but where the user would be providing a zip code. This small library (With a footprint of about 650k) contains the zip code to coordinate mapping as a binary resource embedded in the DLL, so no additional files or interaction with a web service is required. The data contained in the library was obtained through the US Census Bureau and is current as of the year 2000. If a zip code is not contained in the data the nearest match is returned.

Saturday, September 23, 2006 1:00:06 AM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 
Tuesday, September 19, 2006

Apparently you cannot set the IDENTITY flag for a column by simply using an ALTER TABLE/ALTER COLUMN in SQL Server 2000 (Not sure about 2005). I spent some time roaming around BOL and couldn’t seem to figure out how to do it. From what I have been reading Enterprise Manager has to jump through some hoops to accomplish this when you set the flag though the GUI.

Tuesday, September 19, 2006 4:26:36 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 

Here are some good SaaS blog's I have come across:

Gianpaolo's SaaS Blog: http://blogs.msdn.com/gianpaolo/

Fred Chong's SaaS Blog: http://blogs.msdn.com/fred_chong/

Brad's SaaS Blog: http://saasmaan.blogspot.com/

 

Tuesday, September 19, 2006 1:17:33 AM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 
Monday, September 11, 2006

When architecting a SaaS there are two ends of the spectrum when it comes to a data strategy; a shared multi-tenant schema where there is one database in which all tenants reside, then on the other end there is the isolated multi-tenant approach where each tenant has his/her own database in which schema can vary. In the whitepaper “Multi-Tenant Data Architecture”, the SaaS guys at MS describe the options as a “continuum” between these two sides (Which include a number of other important factors but I will focus on the data strategy in this entry). I think that’s a great way to describe the architecture options, especially since both ends of the spectrum can be mixed to provide an optimal solution. In my analysis of the options I don’t see either end by itself as being optimal. Let’s look at a few of the pros and cons (This list is by no means all inclusive!):

  1. Shared
    1. Pros
      1. Ability to update the schema in one place and affect all tenants.
      2. Only one instance to support, less personnel.
      3. Only requires one server (Unless vertically partitioning), less hardware.
      4. Excellent economy of scale!
    2. Cons
      1. Can only update the schema in one place and thereby affect all tenants. (This is both a pro and a con!)
      2. Difficulty restoring individual tenant data.
      3. Can only scale up.
  2. Isolated
    1. Pros
      1. Ability to update tenant schema selectively.
      2. Ability to easily restore individual tenant data.
      3. Ability to easily scale out.
    2. Cons
      1. Must update hundreds or thousands of instances to implement a schema change.
      2. Many instances to support on many servers, more personnel.
      3. Many servers to support, a lot of hardware.
      4. Poor economy of scale (Relative to Shared multi-tenant.)

Like I said, definitely not all inclusive but does represent some of the major pro’s and cons of each end of the spectrum. A little more about these issues:

  1. Schema updates – The shared approach is great when applying schema updates, once and you’re done! This is especially nice when a bug is discovered or when delivering a new version. With an isolated approach the schema must now be applied to hundreds or even thousands of instances! Ouch, no fun at all! On the other hand, what if you are releasing a new version but only want to expose a few clients to it, not all of them? For the most part (Depending on the types of changes made) this is not possible in a shared environment. Everybody gets the changes. And if it turns out that there are major issues with this version it will affect all clients.
  2. Software support – In a shared environment software support is very simple (Relatively speaking… ;-)). Schema changes are applied to one instance, there is only one version of the schema to support, bug fixes are relatively straight forward. This requires less people to support. The isolated approach on the other hand has hundreds or even thousands of database instances and schema variations to support. Bug fixes are not as straight forward since not all schemas might be affected by a particular bug. Developing new versions of the application can be tricky since there can be many variations to the schema which the application will have to account for. On the other hand an isolated approach does allow you the latitude to make custom changes to the schema and accommodate individual tenant needs.
  3. Hardware support – Shared environment, one server, one database (Unless vertically partitioning), a lot less hardware to support. Bad thing is, for the most part, you can only scale up which can seriously limit your ability to handle more tenants or resource hungry tenants. Isolated environment, many databases (Possibly hundreds or thousands) and servers to support. This means more hardware and more support personnel. BUT, you can scale out very easily. A resource hungry tenant can be put on his/her own server and as more tenants are added more hardware can be added allowing you to have more customers.
  4. Restores – Shared, the restored data must be merged in with existing data. This can be difficult, especially if the system was not designed, with the ability to do this, in mind from inception. Isolated is obviously very simple (For the most part) since it will more than likely just require a single database restore.

In an attempt to get all the pro’s and as few of the cons as possible, what if we used a hybrid approach. I will call it Shared Assignable. Basically the system and schema are architected from inception to be shared but support multiple databases which can contain one or more tenants. Tenants can be assigned to database instances through provisioning. This thought is described in the MS whitepaper mentioned previously under “Tenant-Based Horizontal Partitioning” as a scaling technique for a shared database (I would highly recomend reading the entire whitepaper!). Shared Assignable would have the following characteristics:

  1. Support multiple tenants in one database – Each database instance can support one or more tenants. This follows the shared tenant, single databse paradigm.
  2. Support multiple database instances – By means of meta-data, the system will support one or more database instances on one or more servers. Tenants are assigned to an existing database instance or to a new one during the provisioning process.
  3. Support multiple application instances – By means of meta-data a particular version of the application can be tied to a particular database instance. This enables us to run one version of the application for tenants in a particular database instance and another version for another instance. This is important for allowing limited release of a new version of the application.
  4. Support abstract data and schema identification – The data and schema will follow an identification pattern (Similar to implementing an interface on a class) that ties records and schema elements to an individual tenant. This allows the system to identify tenants and also to perform ETL operations on individual clients without being required to understand the full extent of the schema. For example, at a basic level, every schema element would begin with a “MyApp_” and every table will be required to have a “ClientId” column. In the event a tenant must be moved into another database instance for performance or size reasons, the ETL process can simply enumerate tables that begin with “MyApp_” and records that have a “ClientId” of 123. These can then be moved to the new instance without the ETL process having prior understanding of the schema other than the table prefix and “ClientId” column name. These ETL operations can be written once and not have to be updated to handle schema variances between instances or versions.

This approach allows you to group tenants much the same way you would group applications in application pools in IIS 6. For example let’s say you have 1000 tenants. The following factors determining their usage (Again, not all inclusive):

  1. Feature exploitation
  2. Number of concurrent users
  3. Usage time
  4. Data size

As such, they can be classified as follows:

  1. Lightweight – Exploit a small number of features, low number of concurrent users, use the system for a small amount of time each day and do not have much data. Data can reside in a shared environment.
  2. Middleweight Shared– Exploit many of the features, moderate number of concurrent users, use the system during business hours, moderate amount of data. Data can reside in a shared environment.
  3. Middleweight Isolated – Same as above but data cannot be shared because of privacy and/or performance and disk space reasons.
  4. Heavyweight – Exploit all features and drive new features, high number of concurrent users, heavy usage around the clock because of global staff, large quantity of data. Data can not be shared because of privacy and/or performance and disk space reasons.


With that classification let’s say that 800 of those tenants are lightweight, 195 are middleweights 145 of which can be in a shared database and 50 that cannot. And 5 are heavyweights. The isolated approach would produce 1000 separate instances. Since we are obviously “catching the long tail” as seen with the tenant numbers above, we would seriously be loosing economy of scale with the isolated approach. Plus since more than 80% of our clients have very low hardware requirements we would be throwing away those resources which are cutting into profit margins and could be focused on the heavyweights. Schema updates would be expensive as well. The shared approach would be challenging too, schema updates would be a cinch and we would have great economy of scale but our higher end clients could potentially scuttle our hardware as a result of their resource requirements. The Shared Assignable architecture however would in many ways give us the best of both worlds. The following illustrates how we could architect our SaaS data with the above scenario:

 

In this architecture we are appropriately distributing our resources so as to make the most of them. We are giving more to those who need it and less to those who don’t. We are also not creating a high maintenance environment for smaller tenants which do not require it. In this example we are spreading 60 databases across 8 servers. In a purely isolated environment we would be dealing with 1000 databases over many servers. In a strictly shared environment we are dealing with one server, one huge database. Both scenarios would be very difficult to maintain and scale.

When creating a new tenant our provisioning process will give us the ability to add the tenant to an existing database instance or create a new one.  This will allow us to judiciously place the tenant according to their needs and our architecture. Gianpaolo Carraro made an excellent observation in regards to service levels in this type of environment. By giving the provisioning process the ability to provision a tenant in a shared database or isolated database, on a shared server or isolated server we are enabling a pricing structure for the service. A silver package could be a shared instance, a gold package could be an isolated instance on a shared server and a platinum package could be an isolated instance on a standalone server. This allows you to offer a package that best suites the customer and recover costs for the services rendered and resources used.

Using meta-data we can also tie databases instances with a particular application instance. For example let’s say that when we make a major release we only want to release it to one instance in our middleweight pool. The meta-data can direct requests for tenants in that database instance to use the beta version of the application. All other tenants would use the stable version. This mapping would only be available per database instance since the schema is tied to a particular application version.

Lastly, it is key that tenant data be easily movable thus allowing it to be "Assignable". It is inevitable that we will want to move tenants around as needs change and as the tenant base grows. This can be easily achieved by implementing an abstract identification schema on all database objects owned by a tenant. One way to do this is define object prefixes. For example all tables in our app could be prefixed with a “MyApp_Client_”. This could tell any ETL process in our system that this is a “My App” table that contains client records. Secondly we will need to include a client identification column in all client tables, for example “ClientId. Obviously this will be present by virtue of the fact that we are using a shared schema but it will need to follow a set convention in all tables. In other words it needs to have the same column name and data type across all client tables. In an ETL process that moves a tenant for example, we could provide the tenant we want to move and the server/database we want to move it too. The ETL process could then grab all tables that begin with “MyApp_Client_” in the source database and all the records in those tables that have a ClientId of xyz. Our ETL process could then move these records into the target database. In the case where the system allow additional columns to be added to a table, the ETL process can dynamically detect these from the source table and add them to the target table. It is also noteworthy that the base data schema be metadata driven to allow the provisioning process to create new databases either for a new tenant or one that is moved into a new database. Plus it is required for schema upgrades. But that’s a whole other can of worms… ;-)

Monday, September 11, 2006 8:30:17 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [4]  | 
Thursday, September 07, 2006

Never realized this but in order to implement one or more interfaces on an interface in VB.NET you use the Inherits keyword, not the Implements keyword...

Namespace LineChart

   Public Interface IRange

      Inherits IEnumerable
      Inherits IDisposable

      ReadOnly Property Count() As Integer
      ReadOnly Property Items() As RangeItem()

   End Interface

End Namespace

Thursday, September 07, 2006 4:32:57 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  |